Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features

Abstract

This study proposes a cross-domain multi-objective speech assessment model, called MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and distortion assessment scores of an input speech signal. MOSA-Net comprises a convolutional neural network and bidirectional long short-term memory architecture for representation extraction, and a multiplicative attention layer and a fully connected layer for each assessment metric prediction. Additionally, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned (SSL) models are used as inputs to combine rich acoustic information and obtain more accurate assessments. Experimental results show that, in both seen and unseen noise environments, MOSA-Net can improve the linear correlation coefficient (LCC) scores in perceptual evaluation of speech quality (PESQ) prediction compared to Quality-Net, an existing single-task model for PESQ prediction, and improve LCC scores in short-time objective intelligibility (STOI) prediction compared to STOI-Net, an existing single-task model for STOI prediction. Moreover, MOSA-Net can be used as a pre-trained model and effectively adapted to an assessment model for predicting subjective quality and intelligibility scores with a limited amount of training data. Experimental results show that MOSA-Net can improve LCC scores in mean opinion score (MOS) predictions compared to MOS-SSL, a strong single-task model for MOS prediction. We further adopt the latent representations of MOSA-Net to guide the speech enhancement (SE) process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach. Experimental results show that QIA-SE outperforms the baseline SE system, with improved PESQ scores in both seen and unseen noise environments over a baseline SE model.
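The abstract reports its gains in terms of the linear correlation coefficient (LCC) between predicted and reference scores. As a minimal sketch, assuming LCC here means the standard Pearson correlation, it can be computed as below. The function name `lcc` and the score lists are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
import math


def lcc(predicted, reference):
    """Pearson linear correlation between two equal-length score lists."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("LCC is undefined when either sequence is constant")
    return cov / math.sqrt(var_p * var_r)


# Made-up scores for illustration only (not results from the paper).
true_pesq = [1.2, 2.0, 2.8, 3.5, 4.1]
pred_pesq = [1.4, 1.9, 2.9, 3.3, 4.0]
print(round(lcc(pred_pesq, true_pesq), 3))
```

A value close to 1 means the predicted scores rise and fall almost linearly with the reference scores. That linear agreement is what a non-intrusive predictor of PESQ, STOI, or MOS is evaluated on here.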

Related Resources

Perceptually-based objective measure for non-intrusive speech quality assessment

This paper proposes a new perceptually based method for assessing speech quality and evaluates its performance. The method is based on comparing the received speech to an appropriate reference representing the closest match from a preformulated codebook. The codebook holds a number of optimally clustered speech parameter vectors extracted from a large number of various undistorted clean speech r...

Output-Based Objective Measure for Non-Intrusive Speech Quality Evaluation

This paper describes a newly developed output-based method for non-intrusive evaluation of speech quality of voice communication systems, and evaluates its performance. The method, which uses only the output of the system, is based on measuring perceptually motivated objective auditory distances between the voiced parts of the speech signal whose quality is to be evaluated to appropriately matchin...

Non-intrusive Speech Quality Assessment in Simplified E-Model

The E-model brings a modern approach to the computation of estimated quality, allowing for easy implementation. One of its advantages is that it can be applied in real time. The method is based on a mathematical computation model evaluating transmission path impairments influencing speech signal, especially delays and packet losses. These parameters, common in an IP network, can affect speech q...

Non-Intrusive SOM-Based Speech Quality Assessment for Telephony Applications

A non-intrusive method for speech quality assessment in telephony applications is proposed and its performance evaluated. The method involves measuring perception-based objective auditory distances between the voiced parts of the processed (degraded) speech signal to appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering large ...

Multi-Objective Deep Reinforcement Learning

We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the fir...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2023

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2022.3205757